Spatial modelling etc

David L Miller & Mark V Bravington

International Whaling Commission Scientific Committee 2017

Why are we here/why did we do this?

  • Stratified Horvitz-Thompson is the workhorse of many abundance estimates
  • How can H-T fail?
  • When do we need to use spatial models?
  • What are “Bad surveys”?
  • Report http://converged.yt/papers/iwc-2017-ht.pdf

Overview

  • Today:
    1. What's wrong with H-T?
    2. Spatial models overview
    3. What can go wrong with spatial models?
    4. Testing designs in R
    5. Model checking for spatial models
  • Tomorrow:
    1. What we missed, what is hard
    2. Try out tester on your data
    3. Other methods/software, future work
    4. Guidelines

Practicalities

  • We'll try not to talk for more than an hour without a break
  • We both have funny accents, yell if you don't understand!
  • There is maths – don't worry

This is not a distance sampling course!

  • This material usually takes 4 days+ to teach
  • This will not prepare you to analyse spatial data
  • BUT you can do this in St Andrews this summer!
  • creem2.st-andrews.ac.uk

Why are we interested in spatially-explicit estimation?

Inferential aims

Part I

Horvitz-Thompson estimation: the good, the bad and the ugly

Horvitz-Thompson-like estimators

  • Rescale the (flat) density and extrapolate

\[ \hat{N} = \frac{\text{study area}}{\text{covered area}}\sum_{i=1}^n \frac{s_i}{\hat{p}_i} \]

  • \( s_i \) are group/cluster sizes
  • \( \hat{p}_i \) is the detection probability (from distance sampling)
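Plugging in made-up numbers shows how the estimator works (a toy sketch, not real survey data):

```r
# Toy Horvitz-Thompson estimate: 3 detected groups, hypothetical numbers
study_area   <- 10000  # km^2
covered_area <- 500    # km^2 actually surveyed
s <- c(2, 1, 5)        # group sizes
p <- c(0.8, 0.5, 1.0)  # estimated detection probabilities
# rescale the corrected count up to the whole study area
N_hat <- (study_area / covered_area) * sum(s / p)
N_hat  # 20 * (2.5 + 2 + 5) = 190
```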

Variance of H-T

  • Multiple sources of randomness in H-T equation:
    • \( \hat{p}_i \) - detectability
    • \( n \) - dealt with as \( n/L \), encounter rate
    • \( s \) - group size

Hidden in this formula is a simple assumption

  • Every point in the study area has equal probability of being sampled
  • Is this true? Sometimes.
  • If (and only if) the design is randomised

Many faces of randomisation

plot of chunk randomisation

What does this randomisation give us?

  • Coverage probability
  • H-T estimator assumes even coverage
  • (or you can estimate)
  • Otherwise not really valid

Estimating coverage

  • We can estimate coverage of a non-uniform design!
  • In Distance!
  • Example from BC, Canada in this paper:

Thomas paper

Estimating coverage

A complex survey plan

  • Thomas, Williams and Sandilands (2007)
  • Different areas require different strategies
  • Zig-zags, parallel lines, census
  • Analysis in Distance

Sideline: alternative terminology

“A design is an algorithm for laying down samplers in the survey area”

“A realization (from that algorithm) is called a survey plan”

Len Thomas (Talk @CREEM 2004)

H-T estimation again

  • Can't estimate with H-T without knowing coverage
  • “Fixed” “designs” violate assumptions
    • Some animals have \( \mathbb{P}(\text{included})=0 \)
  • Degrades the pooling robustness property
  • What can we do?

More on variance

  • Encounter rate variance estimated from between-transect deviations \( n_j/l_j - n/L \)
  • Within-transect variation can be bad
    • e.g., N-S transect, N-S density gradient

Stratification

  • If we suspect density changes, we can stratify!
  • Pre or post hoc (spatial and non-spatial)

I am going to stop talking very soon

Summary

  • H-T is a spatial model (sort of)
  • An assumption is violated if there's no randomisation
    • Hard to assess how bad this is
  • Fewster et al (2009) and Fewster (2011) give variance approaches

Part II

Spatial models

Spatial models of distance sampling data

  • Collect spatially referenced data
  • Why not make spatially-explicit models?
  • Go beyond stratified estimates
  • Relate environmental covariates to counts

This is the rosy picture talk

We'll talk about the grim reality later

Example data in this talk

Sperm whales off the US east coast

  • Hang out near canyons, eat squid
  • Surveys in 2004, US east coast
  • Combination of data from 2 NOAA cruises
  • Thanks to Debi Palka, Lance Garrison for data. Jason Roberts for data prep.

Example data

Model formulation

  • Pure spatial, pure environmental, mixed?
  • May have some prior knowledge
    • Biology/ecology
  • What are drivers of distribution?
  • Inferential aim
    • Abundance
    • Ecology

Density surface models

Hedley and Buckland (2004)

Miller et al. (2013)

DSM process flow diagram

  • Ignoring group size (more on that tomorrow)

How do we model that?

SPOILER ALERT: your model is probably just a very fancy GLM

Generalised additive models (in 1 slide)

Taking the previous example…

\[ \mathbb{E}\left(n_j\right) = \color{red}{A_j}\color{blue}{\hat{p}_j} \color{green}{\exp}\left[\color{grey}{ \beta_0 + \sum_k s_k(z_{kj})} \right] \]

\( n_j\sim \) some count distribution

  • \( \color{red}{\text{area of segment}} \)
  • \( \color{blue}{\text{probability of detection in segment}} \)
  • \( \color{green}{\text{(inverse) link function}} \)
  • \( \color{grey}{\text{model terms}} \)

What about those s thingys?

Covariates

  • space, time, environmental (remotely sensed?) data

plot of chunk unnamed-chunk-1

plot of chunk seconddsm

Modelling smooths

  • 1-dimension: not much difference
  • 2D more tricky
    • edge effects
    • tricky boundaries
    • more tomorrow
  • Now going to do some maths…
    • (ignore at will)

How do we build them?

plot of chunk unnamed-chunk-2

  • Functions made of other, simpler functions
  • Basis functions, \( b_k \)
  • Estimate \( \beta_k \)
  • \( s(x) = \sum_{k=1}^K \beta_k b_k(x) \)
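A minimal base-R sketch of the idea, using a polynomial basis purely for illustration (mgcv uses proper spline bases, e.g. thin plate or B-splines):

```r
# s(x) as a weighted sum of K simple basis functions b_k(x)
x <- seq(0, 1, length.out = 100)
B <- cbind(1, x, x^2, x^3)   # K = 4 basis functions, one per column
beta <- c(0.5, -1, 2, 0.3)   # hypothetical coefficients
s_x <- B %*% beta            # s(x) = sum_k beta_k b_k(x)
```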

Straight lines vs. interpolation

plot of chunk wiggles

  • Want a line that is “close” to all the data
  • Don't want interpolation – we know there is “error”
  • Balance between interpolation and generality

How wiggly is a function?

Animation of derivatives

Making wigglyness matter

  • Fit needs to be penalised
  • Something like:

\[ \int_\mathbb{R} \left( \frac{\partial^2 s(x)}{\partial x^2}\right)^2 \text{d}x \]

  • (Can always re-write this in the form \( \mathbf{\beta}^T \mathbf{S} \mathbf{\beta} \))
  • Estimate the \( \beta_k \) terms but penalise objective
    • “closeness to data” + penalty (REML/ML)
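In penalised least squares form this is a one-line solve; the sketch below uses a hypothetical diagonal \( S \) for illustration (mgcv builds \( S \) from the wigglyness penalty and estimates \( \lambda \) by REML/ML):

```r
# Penalised fit: minimise ||y - B beta||^2 + lambda * beta' S beta
set.seed(1)
x <- seq(0, 1, length.out = 50)
y <- sin(2 * pi * x) + rnorm(50, sd = 0.2)
B <- cbind(1, x, x^2, x^3)       # toy polynomial basis
S <- diag(c(0, 0, 1, 1))         # hypothetical penalty: leave intercept/linear unpenalised
lambda <- 10                     # smoothing parameter (fixed here, estimated in practice)
beta_hat <- solve(t(B) %*% B + lambda * S, t(B) %*% y)
fit <- B %*% beta_hat            # the penalised smooth
```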

Smoothing parameter

plot of chunk wiggles-plot

Sideline: GAMs are Bayesian models

  • Generally:
    • penalties are improper prior precision matrices
    • (nullspace gives improper priors)
  • Using shrinkage smoothers:
    • proper priors
    • empirical Bayes interpretation

Beyond univariate smooths?

plot of chunk tensor

  • Can build (anisotropic) tensor product terms
  • Take 2 or more univariate terms
  • Thin plate regression splines allow multivariate terms (isotropic)

Spatial smoothing

plot of chunk unnamed-chunk-3

  • Can just smooth in space
  • Valid abundance estimation technique
  • Useful as EDA for environmental covariate models (hard; more on day 2!)
  • Not good for extrapolations
  • Basis choice can matter!

Why GAMs are cool...

  • Fancy smooths (cyclic, boundaries, …)
  • Fancy responses (exp family and beyond!)
  • Random effects (by equivalence)
  • Markov random fields
  • Correlation structures
  • See Wood (2006/2017) for a handy intro

Let's fit a model

library(dsm)
# environmental covariates
dsm_env_tw <- dsm(count~s(Depth) + s(NPP) + s(SST),
                  ddf.obj=df_hr,
                  segment.data=segs, observation.data=obs,
                  family=tw())
# space
dsm_xy_tw <- dsm(count~s(x, y), ddf.obj=df_hr,
                 segment.data=segs, observation.data=obs,
                 family=tw())

dsm is based on mgcv by Simon Wood

Simple! Done?

NO

More on model checking later...

Predictions/abundance estimates

plot of chunk predplot

  • Grid of covariates must be available
    • Predict within survey area
    • Extrapolate outside (with caution)
  • Working on a grid of cells
  • Plot is s(x,y) + s(Depth)
  • Add up to get abundance
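The last step is just a weighted sum over the grid (toy numbers here; in practice the densities come from predicting the fitted DSM over the grid):

```r
# Abundance as a sum over prediction cells: N_hat = sum(A_j * D_hat_j)
cell_area   <- rep(25, 4)               # km^2, hypothetical grid cells
density_hat <- c(0.1, 0.3, 0.05, 0.2)   # animals per km^2, hypothetical predictions
N_hat <- sum(cell_area * density_hat)
N_hat  # 25 * 0.65 = 16.25
```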

Estimating variance

  • Uncertainty from:
    • detection function parameters
    • spatial model
  • Need to propagate uncertainty!
    • Methods in dsm
    • Bravington, Hedley & Miller (in prep)

Plotting uncertainty

plot of chunk vars

  • Maps of coefficient of variation
  • CV for given stratum (better)
  • Visualisation is hard

Communicating uncertainty

  • Are animations a good way to do this?
  • Simulate from posterior parameter distribution
  • \( \boldsymbol{\beta} \sim N(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\Sigma}}) \)
  • Some features (e.g. shelf, N-S gradient) stick out
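A sketch of the simulation recipe, using a toy linear model in place of the fitted GAM (for a GAM, swap in `coef()`, `vcov()` and the prediction matrix from `predict(..., type="lpmatrix")` in mgcv):

```r
# Draw coefficient vectors from N(beta_hat, Sigma_hat), map each to predictions
set.seed(1)
x <- seq(0, 1, length.out = 20)
m <- lm(y ~ x, data = data.frame(x = x, y = 2 + 3 * x + rnorm(20, sd = 0.1)))
beta_hat  <- coef(m)
Sigma_hat <- vcov(m)
Xp <- cbind(1, x)                 # prediction "design" matrix
L  <- chol(Sigma_hat)             # upper-triangular factor for sampling
draws <- t(replicate(200, drop(beta_hat + t(L) %*% rnorm(2))))
pred_draws <- Xp %*% t(draws)     # each column = one simulated prediction
```

Animating over the columns of `pred_draws` shows which features persist across draws and which are noise.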

I am going to stop talking very soon

Summary

  • Build models in stages (detection function + GAM)
  • Counts are functions of covariates
    • Pure spatial models
    • Environmental covariate models
    • Mix?!
  • Fit/check using dsm
  • Most of the theory is resolved, applications are hard

Part III

H-T or spatial or give up?

Spatial models can help

  • Spatial modelling can give unbiased abundance estimates
    • even with uneven coverage
    • limits to extrapolation
  • Very even coverage => H-T?
  • “Evenness” is subtle; detectability has an effect
    • e.g., weather bad in east

Weather or distribution?

  • Weather has a big effect on detectability
  • Need to record during survey
  • Disambiguate between distribution/detectability
  • Potential confounding can be BAD

Visibility during POWER 2014

Thanks to Hiroto Murase and co for this data!

Covariates can make a big difference!

Other stuff

Spatial modelling won't solve all yr problems

  • Design issues
    • Ludicrous extrapolation
    • Survey plan not robust to weather issues
    • Non-uniform distribution wrt sampler
    • Migration

Spatial models alone can't solve these issues

Spatial modelling won't solve all yr problems

  • Violations of survey procedure
    • Following animals
    • Responsive movement
    • Guarding the trackline
    • Group size estimation

Spatial models alone can't solve these issues

Spatial modelling won't solve all yr problems

  • Detection functions
    • Not enough observations
    • Uncertain species ID
    • Group size

Spatial models alone can't solve these issues

Should everything be spatial?

  • Do you have enough observations?
  • If the design does look good (even coverage, etc.)
  • Is it worth re-analysing rather than using H-T?
  • Are point estimates similar?
  • Variance may well be different

I am going to stop talking very soon

Summary

  • Spatial models don't solve all problems
  • Complex models can lead to complex issues
  • Recording weather conditions is important
  • You can always give up!

Part IV

Testing designs

What can we do?

  • Take a survey and simulate
  • Is H-T robust?
  • How do different spatial models compare?
    • Only thinking about total abundance & CV

Software

  • ltdesigntester github.com/dill/ltdesigntester
  • (based on DSsim by Laura Marshall, CREEM)
  • Setup simulations, test what can be done
  • Most of the work needs to be done in GIS
    • Survey shapefiles, covariates etc
    • Import to R, run models, view output

Setting up a survey simulation

Density

  • Grid in polygon of study area
  • Either specify simple gradient or use other tools to make complex density
  • Density as grid

Design

  • Generate using GIS/Distance
  • Export to shapefile

Detection function

  • Functional form (half-normal, hazard-rate)
  • Parameters (scale, shape)
  • Truncation
  • (Covariates via multiple functions, more later)

Specification to simulation

  • Generate multiple realizations
  • Analyse each with many models
    • Different spatial, H-T
  • Compare results

Test models

  • Spatial smoothers
    • thin plate spline, bs="tp" (Wood, 2003)
    • thin plate spline with shrinkage, bs="ts" (Marra et al., 2011)
    • Duchon spline, bs="ds", m=c(1, 0.5) (Miller et al., 2014)
    • tensor of thin plate spline (w/ and w/o rotated covariates)
  • Stratified estimates
    • Horvitz-Thompson (w/ and w/o covariates)
    • stratified Horvitz-Thompson (w/ and w/o covariates)

Comparing performance

Important caveats

  • No model checking
  • Dependent on “good” detection specs
  • No group size model
  • No \( g(0) \) or availability

Quick example code

library(ltdesigntester)
# setup a simulation
my_sim <- build_sim(design_path="path/to/shp",
                    dsurf=density_surface_matrix,
                    n_grid_x=dsurf_dim_x,
                    n_grid_y=dsurf_dim_y,
                    n_pop=true_N,
                    df=detection_function_specs,
                    region="path/to/shp")
# run it!
res <- do_sim(nsim=number_of_sims,
              scenario=my_sim,
              pred_dat=prediction_data_frame, ...)

We made a big deal about weather earlier...

  • We can add covariates too (a wee bit unwieldy at the moment)
  • Build multiple detection functions/sims in list()
  • Covariates vary according to:
    • logistic function E-W (can set parameters; 2-state)
    • set values in segment data (already observed)

I am going to stop talking very soon

Summary

  • We can test multiple (simple) scenarios
    • Assumption of simple gradients
    • Models likely won't work for difficult stuff if they don't work for simple things
  • What will work/what won't
  • Simple summary plots
  • Better than the rest \( \neq \) good

Part V

Model checking for DSMs

Model checking

  • Count distribution
  • Basis complexity
  • Model (term) selection
  • Sensitivity
  • Observed vs. expected
  • Cross-validation (replicability)

(Plus all the usual stuff for detection functions!)

Count distributions

plot of chunk countshist

  • Response is a count
  • Often, it's mostly zero
  • Aggregations occur at scales smaller than the spatial model resolves
  • Want a response distribution that deals with that
  • Could mess up variance estimates if ignored
  • Linked to segmenting
  • Flexible mean-variance relationship

Negative binomial

plot of chunk negbin

  • \( \text{Var}\left(\text{count}\right) = \mathbb{E}(\text{count}) + \kappa \mathbb{E}(\text{count})^2 \)
  • Estimate \( \kappa \)
  • Is quadratic relationship a “strong” assumption?
  • Similar to Poisson: \( \text{Var}\left(\text{count}\right) =\mathbb{E}(\text{count}) \)

Tweedie distribution

plot of chunk tweedie

  • \( \text{Var}\left(\text{count}\right) = \phi\mathbb{E}(\text{count})^q \)
  • Common distributions are sub-cases:
    • \( q=1 \Rightarrow \) Poisson
    • \( q=2 \Rightarrow \) Gamma
    • \( q=3 \Rightarrow \) inverse-Gaussian
  • We are interested in \( 1 < q < 2 \)
  • (here \( q = 1.2, 1.3, \ldots, 1.9 \))

Basis complexity

  • Before: \( s(x) = \sum_{k=1}^K \beta_k b_k(x) \)
  • How big should \( K \) be? “Big enough”
  • Penalty takes care of the rest
  • ?gam.check gives useful output
    • (also residual checks etc)

gam.check text output

gam.check(dsm_env_tw)

Method: REML   Optimizer: outer newton
full convergence after 8 iterations.
Gradient range [-3.139726e-08,2.036272e-08]
(score 375.9503 & scale 4.316452).
Hessian positive definite, eigenvalue range [0.5725432,298.5906].
Model rank =  28 / 28 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

            k'   edf k-index p-value
s(Depth) 9.000 4.049   0.814    0.36
s(NPP)   9.000 2.846   0.779    0.04
s(SST)   9.000 4.916   0.771    0.04

Tobler's first law of geography

“Everything is related to everything else, but near things are more related than distant things”

Tobler (1970)

Implications of Tobler's law

plot of chunk pairrrrs

What can we do about this?

  • Careful inclusion of terms
  • Test for sensitivity (lots of models)
  • Fit models using robust criteria (REML)
  • Test for concurvity (mgcv::concurvity, dsm::vis.concurvity)


Term selection

  • (approximate) \( p \) values (Marra & Wood, 2012)
    • path dependence issues
  • shrinkage methods (Marra & Wood, 2011)
  • ecological-level term selection
    • which biomass measure?
    • include spatial smooth or not?

Observed vs. expected

  • Diagnostic – compare observed vs. expected counts
    • Compare for different covariate/aggregations
  • In the next version of dsm, obs_exp() does this
  • Going back to those rough POWER models…
> obs_exp(b, "beaufort")
               1        2       34
Observed 3.00000 10.00000 80.00000
Expected 6.97715 12.42649 83.03773

> obs_exp(b_nc, "beaufort")
                1        2       34
Observed 3.000000 10.00000 80.00000
Expected 8.478759 17.00705 73.23535

Cross-validation

  • How well does the model reproduce what we saw?
  • Leave out one area, re-fit model, predict to new data
  • Wenger & Olden (2012) have good spatial examples

Cross-validation example

Cross-validation example

I am going to stop talking very soon

2 (or more)-stage models

  • Not “cool” (statistically), but…
  • Multi-stage models are handy!
  • Understand and check each part
  • Split your modelling efforts amongst people

Conclusions

  • This methodology is general
    • Bears, birds, beer cans, Loch Ness monsters…
  • Models are flexible!
    • Linear things, smooth things, random effect things (and more)
  • If you know GLMs, you can get started with DSMs
    • Mature theoretical basis, still lots to do
  • Active user community, active software development

Resources

Thanks!

Slides w/ references available at converged.yt

References

Fewster, R.M., Buckland, S.T., Burnham, K.P., Borchers, D.L., Jupp, P.E., Laake, J.L., et al. (2009) Estimating the Encounter Rate Variance in Distance Sampling. Biometrics, 65, 225–236.
Fewster, R.M. (2011) Variance estimation for systematic designs in spatial surveys. Biometrics, 67, 1518–1531.
Hedley, S. L., & Buckland, S. T. (2004). Spatial models for line transect sampling. Journal of Agricultural, Biological, and Environmental Statistics, 9(2).
Marques, T. A., Thomas, L., Fancy, S. G., & Buckland, S. T. (2007). Improving estimates of bird density using multiple-covariate distance sampling. The Auk, 124(4).
Marra, G., & Wood, S. N. (2011). Practical variable selection for generalized additive models. Computational Statistics and Data Analysis, 55(7).
Marra, G., & Wood, S. N. (2012). Coverage Properties of Confidence Intervals for Generalized Additive Model Components. Scandinavian Journal of Statistics, 39(1).
Wenger, S.J. and Olden, J.D. (2012) Assessing transferability of ecological models: an underappreciated aspect of statistical validation. Methods in Ecology and Evolution, 3, 260–267.

Handy awkward question answers

Don't throw away your residuals!

gam.check

plot of chunk unnamed-chunk-5

rqgam.check (Dunn and Smyth, 1996)

plot of chunk unnamed-chunk-6

Penalty matrix

  • For each \( b_k \) calculate the penalty
  • Penalty is a function of \( \beta \)
    • \( \lambda \beta^\text{T}S\beta \)
  • \( S \) calculated once
  • smoothing parameter (\( \lambda \)) dictates influence

How wiggly are things?

  • We can set basis complexity or “size” (\( k \))
    • Maximum wigglyness
  • Smooths have effective degrees of freedom (EDF)
  • EDF < \( k \)
  • Set \( k \) “large enough”

Let's talk about detectability

Detectability

Distance sampling

  • “Fit to the histogram”
  • Model:

\[ \mathbb{P} \left[ \text{animal detected } \vert \text{ animal at distance } y\right] = g(y;\boldsymbol{\theta}) \]

  • Calculate the average probability of detection:

\[ \hat{p} = \frac{1}{w} \int_0^w g(y; \boldsymbol{\hat{\theta}}) \text{d}y \]
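For example, with a half-normal detection function and hypothetical \( \sigma \) and truncation \( w \), \( \hat{p} \) is a one-line numerical integral in base R:

```r
# Average detection probability for a half-normal detection function
# g(y) = exp(-y^2 / (2 * sigma^2)); sigma and w are made-up values
sigma <- 100   # scale parameter (m)
w     <- 250   # truncation distance (m)
g <- function(y) exp(-y^2 / (2 * sigma^2))
p_hat <- integrate(g, lower = 0, upper = w)$value / w
p_hat  # average probability of detecting an animal within [0, w]
```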

Distance sampling (extensions)

  • Covariates that affect detectability (Marques et al, 2007)
  • Perception bias (\( g(0)<1 \)) (Burt et al, 2014)
  • Availability bias (Borchers et al, 2013)
  • Detection function formulations (Miller and Thomas, 2015)
  • Measurement error (Marques, 2004)  

Figure from Marques et al (2007)

That's not really how the ocean works...

Availability

We can only see whales at the surface

  • What proportion of the time are they there?
    • Acoustics
    • Tags (DTAGs etc)
    • Behavioural studies
  • Fixed correction to \( \hat{p} \)?
  • Model via fancy Markov models (Borchers et al, 2013)

Picture from University of St Andrews Library Special Collections